

The immediate goals of the research being conducted under NASA grant NAG 1-750 are twofold:
to create data useful to the study of software reliability and to produce results pertinent to software
reliability through the analysis of existing reliability models and data. The long-term target of this
research is to identify or create a model for use in analyzing the reliability of flight control software.

The data creation portion of this research to date consists of a GCS design document created by
Wenhui Shen and of work by Larry Wilson in supervising Shen and in interfacing with the NASA and
RTI experimenters. This will shortly lead to design and code reviews, with the resulting product being
one of the versions used in the Terminal Descent Experiment being conducted by the SVMB of
NASA/Langley. Further efforts are being expended in parallel with the NASA experiment, with the
intent of producing more versions in undergraduate classes at ODU. This parallel work is being
conducted by C. M. Overstreet at ODU, aided by Brenda Ellis. Brenda is supported by a Minority
Student Research Grant, which is auxiliary to this grant.

The analysis at this time is being done by Wilson and Shen. This has resulted in a recent
paper which has been submitted to the ODU CS review process for a technical report. It is expected to
be approved any day now and a copy is enclosed, even though it is not yet officially a technical report.
The title of the paper is 'Simulation Studies of Software Reliability Models'. This paper has also been
submitted to the International Working Conference on Dependable Computing for Critical Applications
in Santa Barbara on Aug 23 through 25, 1989. Dennis Link has recently been added to the group
working on projects related to this grant. Dennis will begin by looking into new ways to exploit the
residue from the previous experiment conducted by RTI for SVMB. This experiment involved three
versions of the Launch Interceptor Program, and the residue is perceived as being very valuable in ways
which have not yet been exploited. Further, it is expected that the experience we get in looking at this
previous experiment will prove useful in preparing us to plan for and analyze the data created by the
Terminal Descent Experiment.
Abstract


The Jelinski-Moranda and Geometric models for software reliability failed the consistency test
which we proposed. We challenged these models to take data which comes from a process which they 
have correctly modeled and to make predictions about the reliability of that process. We found that 
either model, given data precisely from a process it correctly models, will usually fail to make good 
predictions. We attribute these problems to randomness in the data used as input to the models and 
indicate a remedy for this lack of robustness, namely replication of data. 


Additional Key Words and Phrases: Growth Models, Software Reliability, Simulation, and Replication 


0. Introduction

The Jelinski-Moranda [3] and the Geometric [4] models are well known and widely used in the field
of software reliability. These models assume the software being modeled is a Poisson Process with
constant failure rate between two consecutive failures. Both use the sequence of interfailure times from
the debugging process to make maximum likelihood estimates of parameters associated with the
models. That is, they predict the future performance of the software based on the data from the
debugging process.

The Jelinski-Moranda model assumes that there are N initial bugs in the software, that each has
the common failure rate $\phi$, and that the failure rate of the program is the number of bugs present
multiplied by $\phi$. Thus if i - 1 bugs have been removed, the failure rate is $\lambda_i = (N-i+1)\phi$. If n
errors have been removed, then interfailure times $t_1, t_2, \ldots, t_n$ have been generated, and these may be
inserted into the following likelihood equation, which corresponds to this model.

$$L(t_1, t_2, \ldots, t_n; N, \phi) = \prod_{i=1}^{n} \phi (N-i+1) e^{-\phi (N-i+1) t_i}$$


This research was in part supported by NASA Grant 1-750. 
The likelihood equation may be maximized by letting $N = \hat{N}$ and $\phi = \hat{\phi}$, where $\hat{N}$ and $\hat{\phi}$
form the solution of the following equations and are used as estimators of N and $\phi$.

$$\hat{\phi} = \frac{n}{\hat{N} \sum_{i=1}^{n} t_i - \sum_{i=1}^{n} (i-1) t_i}$$

$$\sum_{i=1}^{n} \frac{1}{\hat{N}-i+1} = \frac{n \sum_{i=1}^{n} t_i}{\hat{N} \sum_{i=1}^{n} t_i - \sum_{i=1}^{n} (i-1) t_i}$$
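The two Jelinski-Moranda estimating equations have no closed-form solution, so they are normally solved numerically. The following is a minimal sketch, not the program used in this study: the function name `jm_mle`, the bisection scheme, and the cutoff `max_N` are assumptions made for illustration. It relies on the fact that a finite root exists only when the weighted mean of i-1 (with weights $t_i$) exceeds (n-1)/2; otherwise the sketch reports an infinite estimate.

```python
import math

def jm_mle(times, tol=1e-9, max_N=1e7):
    """Solve the Jelinski-Moranda estimating equations for (N_hat, phi_hat).

    Returns (math.inf, 0.0) when no finite root is found below max_N
    (max_N is an arbitrary cutoff chosen for this sketch).
    """
    n = len(times)
    if n < 2 or any(t <= 0 for t in times):
        raise ValueError("need at least two positive interfailure times")
    T = sum(times)                                # sum of t_i
    S = sum(j * t for j, t in enumerate(times))   # sum of (i-1) * t_i
    a = S / T
    if a <= (n - 1) / 2.0:
        return math.inf, 0.0                      # likelihood has no finite maximum

    def g(N):
        # Left side minus right side of the second estimating equation.
        return sum(1.0 / (N - j) for j in range(n)) - n / (N - a)

    lo, hi = float(n - 1), 2.0 * n                # g tends to +infinity just above n - 1
    while g(hi) > 0:                              # expand until the sign changes
        hi *= 2.0
        if hi > max_N:
            return math.inf, 0.0
    for _ in range(200):                          # plain bisection
        mid = 0.5 * (lo + hi)
        if mid <= lo or mid >= hi:                # interval can no longer be split
            break
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    N_hat = 0.5 * (lo + hi)
    phi_hat = n / (N_hat * T - S)                 # first estimating equation
    return N_hat, phi_hat

# Noise-free check: use the expected interfailure times 1 / ((N - i + 1) * phi)
# for N = 100, phi = 0.001, n = 40, which satisfy both equations exactly.
ideal = [1.0 / ((100 - i + 1) * 0.001) for i in range(1, 41)]
N_hat, phi_hat = jm_mle(ideal)
```

With noise-free input the seeded parameters are recovered; with simulated (random) input the same routine can return values far from the seeds, or no finite estimate at all.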


The Geometric model assumes that the failure rate after removing i-1 bugs is $\lambda_i = \alpha \beta^{i-1}$.
Again the data of n interfailure times is inserted into the corresponding likelihood equation.

$$L(t_1, t_2, \ldots, t_n; \alpha, \beta) = \prod_{i=1}^{n} \alpha \beta^{i-1} e^{-\alpha \beta^{i-1} t_i}$$

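This likelihood may be maximized by letting $\alpha = \hat{\alpha}$ and $\beta = \hat{\beta}$, where $\hat{\alpha}$ and $\hat{\beta}$ form the
solution to the following equations and are used as estimators.

$$\hat{\alpha} = \frac{n}{\sum_{i=1}^{n} \hat{\beta}^{i-1} t_i}$$

$$\frac{\sum_{i=1}^{n} (i-1) \hat{\beta}^{i-1} t_i}{\sum_{i=1}^{n} \hat{\beta}^{i-1} t_i} = \frac{n-1}{2}$$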
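The Geometric estimating equations can likewise be solved numerically. The sketch below is an illustration under our own assumptions (the name `geometric_mle` and the bisection in log space are not from the original study). It uses the condition obtained by setting the derivatives of the log-likelihood to zero: the weighted mean of i-1, with weights $\beta^{i-1} t_i$, must equal (n-1)/2, after which $\hat{\alpha}$ follows directly.

```python
import math

def geometric_mle(times):
    """Solve the Geometric-model estimating equations for (alpha_hat, beta_hat)."""
    n = len(times)
    if n < 2 or any(t <= 0 for t in times):
        raise ValueError("need at least two positive interfailure times")
    target = (n - 1) / 2.0

    def mean_index(log_beta):
        # Weighted mean of (i-1) with weights beta^(i-1) * t_i, computed in
        # log space so that large powers of beta cannot overflow.
        logs = [j * log_beta + math.log(t) for j, t in enumerate(times)]
        top = max(logs)
        weights = [math.exp(v - top) for v in logs]
        return sum(j * w for j, w in enumerate(weights)) / sum(weights)

    # mean_index increases from 0 to n-1 as beta grows, so a bracket always exists.
    lo, hi = -1.0, 1.0
    while mean_index(lo) > target:
        lo *= 2.0
    while mean_index(hi) < target:
        hi *= 2.0
    for _ in range(200):                          # plain bisection on log(beta)
        mid = 0.5 * (lo + hi)
        if mean_index(mid) < target:
            lo = mid
        else:
            hi = mid
    log_beta = 0.5 * (lo + hi)
    beta_hat = math.exp(log_beta)
    alpha_hat = n / sum(math.exp(j * log_beta) * t for j, t in enumerate(times))
    return alpha_hat, beta_hat

# Noise-free check: expected interfailure times 1 / (alpha * beta^(i-1))
# for alpha = 0.1, beta = 0.8, n = 20.
ideal = [1.0 / (0.1 * 0.8 ** (i - 1)) for i in range(1, 21)]
alpha_hat, beta_hat = geometric_mle(ideal)
```

The estimate of the next failure rate is then $\hat{\alpha} \hat{\beta}^{n}$.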


We show in the following sections that neither of these models is robust. That is, if you were
to debug the same program twice, generating two sequences of interfailure times, then each
model could give very different estimates for its parameters.


1. Simulation of Interfailure Times


Since these models each describe a Poisson Process with constant failure rate $\lambda_i$, the
probability that the next interfailure time is less than t is $1 - e^{-\lambda_i t}$. Thus we may obtain an
interfailure time by generating a uniformly distributed random number r between 0 and 1 and
solving $r = 1 - e^{-\lambda_i t_i}$ for $t_i$ [7].
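Solving the equation above for the interfailure time gives t = -ln(1 - r) / rate. A minimal sketch of this inversion step follows; the function name, the seed, and the rate 0.1 are chosen only for illustration and are not taken from the original experiments.

```python
import math
import random

def sample_interfailure_time(rate, rng):
    """Draw one interfailure time by inverting r = 1 - exp(-rate * t)."""
    r = rng.random()                    # uniform on [0, 1)
    return -math.log(1.0 - r) / rate    # 1 - r lies in (0, 1], so the log is defined

rng = random.Random(1989)               # arbitrary seed, for repeatability only
times = [sample_interfailure_time(0.1, rng) for _ in range(10000)]
mean_time = sum(times) / len(times)     # should be close to 1 / rate = 10
```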


2. Testing the Models 
A. Jelinski-Moranda Model tests

We assumed a piece of software which has its reliability correctly modeled by
Jelinski-Moranda, with parameters N and $\phi$ fixed. Thus $\lambda_1 = N\phi$, and we used the simulation
process to generate $t_1$, decreased $\lambda_1$ by $\phi$ to get $\lambda_2$, and simulated to get $t_2$. After iterating n
times we had data representing n interfailure times from the debugging process.
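The iteration just described can be sketched as follows. This is an illustrative reconstruction, not the original simulation program; the function name and the seed are assumptions, while N = 100, the failure rate per bug of 0.001, and n = 30 are values used in the tests.

```python
import math
import random

def simulate_jelinski_moranda(N, phi, n, rng):
    """Simulate n interfailure times from a Jelinski-Moranda process."""
    if n > N:
        raise ValueError("cannot remove more bugs than are present")
    times = []
    for i in range(1, n + 1):
        rate = (N - i + 1) * phi                          # rate once i-1 bugs are removed
        times.append(-math.log(1.0 - rng.random()) / rate)  # inverse-transform draw
    return times

rng = random.Random(42)                 # arbitrary seed, for repeatability only
one_debugging = simulate_jelinski_moranda(100, 0.001, 30, rng)
```

Calling the function again with the same parameters but a different random stream plays the role of debugging the same program a second time.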


The simulated interfailure times were used as input to the model and $\hat{N}$ and $\hat{\phi}$ were
calculated. The estimated values $\hat{N}$ and $\hat{\phi}$ usually differed greatly from the seeded values, and
there were large variations amongst the estimates from different simulations for the same
seeded values. Each of the following histograms was constructed by generating 128 sets of
interfailure times for each value of n, with N fixed at 100 and $\phi$ fixed at 0.001. Each x-axis
represents values of $\hat{N}$.
Fig. 1


Figure 1.b shows that for N = 100, $\phi$ = .001, and n = 30, $\hat{N}$ falls between 95 and
105 less than 5% of the time. Further, for the same graph, $\hat{N}$ falls between 85 and 115
approximately 10% of the time. The other graphs tell a similar discouraging story. As
expected, the best estimates are given by the case where n = 70, but even then only about
55% of the estimates are within 15 of 100. We must also point out that since 70 errors have
been removed we are actually only trying to estimate the remaining 30. An error of more than 15
is more than 50% of that remainder, so our estimates are off by 50% or more about 45% of the
time. We conclude that the model is very sensitive to random variations in the input data even
when the data is precisely what the model says it should be. Thus the Jelinski-Moranda model
should not be used to make predictions about software based on the interfailure times of the
debugging process.







B. Geometric Model tests 
In this case $\alpha$ and $\beta$ were assigned arbitrary but realistic values, $\alpha$ = 0.1, $\beta$ = 0.8, and
for each i, $\lambda_i = \alpha \beta^{i-1}$. Data was simulated using the changing $\lambda_i$ values and the model was
used to predict $\lambda_{n+1}$, the failure rate of the product after n bugs have been removed. Once
again the results of 128 repetitions for each value of n are represented by histograms. In these
graphs the x-axis measures the percentage error of the estimated value from the correct value of
$\lambda_{n+1}$.
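A minimal sketch of this setup is given below. The seeded values 0.1 and 0.8 and n = 20 come from the text; the function names, the seed, and the signed definition of percentage error are assumptions made for illustration.

```python
import math
import random

def simulate_geometric(alpha, beta, n, rng):
    """Simulate n interfailure times with failure rate alpha * beta^(i-1)."""
    times = []
    for i in range(1, n + 1):
        rate = alpha * beta ** (i - 1)
        times.append(-math.log(1.0 - rng.random()) / rate)  # inverse-transform draw
    return times

def percent_error(estimate, true_value):
    """Signed percentage error of an estimate relative to the true value."""
    return 100.0 * (estimate - true_value) / true_value

alpha, beta, n = 0.1, 0.8, 20
true_next_rate = alpha * beta ** n      # failure rate after n bugs are removed
times = simulate_geometric(alpha, beta, n, random.Random(3))  # arbitrary seed
```

An estimate of the next failure rate obtained from `times` would be compared against `true_next_rate` with `percent_error` to place it in one of the histogram bars.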
In these histograms the 0% bar represents the proportion of the estimates which fall
within plus or minus 10% of $\lambda_{n+1}$. For n = 20, only (approximately) 12% of the estimates
come within 10% of the desired value. Also for n = 20, only (approximately) 46% of the
estimates come within plus or minus 30% of the correct value. This indicates that the
Geometric Model also has trouble handling the variations in the data from one debugging to
the next.


3. Replicated Testing


Nagel [5,6] introduced the idea of replicated debugging, which we characterize as follows.
Given a piece of software, make r copies and debug each of the copies for a period of time, which
generates r sets of replicated interfailure data. The overhead for approximating this process is
less than one might anticipate, since all but the normal debugging effort can be automated.

If we repeat the tests for both models using interfailure data which is the average of r
replications instead of data from a single debugging, we now see that the models perform much
better as r increases. Each of the histograms in figures 3 and 4 was constructed by generating
128 sets of average interfailure times, each of which was formed by averaging the corresponding
interfailure times from r replicates. These histograms, when considered with those in the
previous section, indicate that the models require more than the normal debugging data in order
to give good predictions and that replication offers a remedy. In particular, figure 3.d.1
indicates that with 30 replicates the Jelinski-Moranda model with n = 70 gives estimates
between 95 and 105 about 80% of the time and almost always gives estimates between 85
and 115. This pattern holds throughout: we get better estimates from either model when we
increase r.
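The averaging step can be sketched as follows. The sketch simulates the replicates, so it illustrates only the averaging of corresponding interfailure times and not the replicated debugging of real software; the function names and the seed are assumptions, while N = 100, a failure rate per bug of 0.001, n = 70, and r = 30 are values used in the text.

```python
import math
import random

def simulate_jm(N, phi, n, rng):
    """Simulate n Jelinski-Moranda interfailure times (rate (N-i+1)*phi)."""
    return [-math.log(1.0 - rng.random()) / ((N - i + 1) * phi)
            for i in range(1, n + 1)]

def average_replicates(replicates):
    """Average the i-th interfailure time across all replicates."""
    if not replicates or len({len(rep) for rep in replicates}) != 1:
        raise ValueError("replicates must be non-empty and of equal length")
    r = len(replicates)
    return [sum(column) / r for column in zip(*replicates)]

rng = random.Random(7)                  # arbitrary seed, for repeatability only
replicates = [simulate_jm(100, 0.001, 70, rng) for _ in range(30)]
averaged = average_replicates(replicates)   # one sequence of 70 averaged times
```

The averaged sequence is then passed to the model in place of the interfailure times from a single debugging.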
4. Confidence Intervals in Predictions


If we wish to quantify our confidence in the predictions of the models, then we can look
at confidence intervals. Suppose we wish to be 90% certain that our estimate is within 10% of
the value of the parameter we are trying to predict.

The following graph shows the (n,r) pairs which combine to give estimates within 10% of
the value of N-n for the Jelinski-Moranda model with N = 100 and $\phi$ = .001. It is based on
2,500 repetitions for each (n,r) pair displayed.
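The criterion applied to each (n,r) pair can be written as a short check. The helper name and the five estimates below are hypothetical, made up purely to show the arithmetic; they are not results from the study.

```python
def fraction_within(estimates, true_value, rel_tol=0.10):
    """Fraction of estimates lying within rel_tol (relative) of true_value."""
    if not estimates:
        raise ValueError("need at least one estimate")
    hits = sum(1 for e in estimates
               if abs(e - true_value) <= rel_tol * abs(true_value))
    return hits / len(estimates)

remaining = 100 - 70                          # N - n for N = 100, n = 70
estimates = [28.0, 31.5, 32.0, 36.0, 24.0]    # hypothetical estimates of N - n
share = fraction_within(estimates, remaining) # 3 of 5 fall within 27..33
meets_target = share >= 0.90                  # the 90% requirement
```

An (n,r) pair would qualify when `meets_target` is true over all of its repetitions.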



This graph shows that good predictions are possible for simulated data with replication. It
also shows that without replication one should not expect good predictions.

A similar graph could be constructed for the Geometric Model. It too would support the
need for replication and show that by increasing either n or r one can obtain better estimates.
However, the Nagel experimental design generates replicated data efficiently by exploiting
information discovered during the normal debugging process. She uses the failure information and
fixes generated during debugging to automate the process of replication. Further, this could
run in the background or in parallel with the debugging effort, and thus it should be less
expensive in both time and money to increase r rather than n.
5. Summary

These models should not be used in the real world to make predictions without replication.
This does not guarantee that the models with replication will give good estimates, since
the goodness of fit problem has not been addressed here, and previous efforts to validate these
models have not used replicated data and hence are suspect. It is clear from this work that
random chance is likely to dominate if one uses only the data from one replicate. We also claim
that the field of software reliability has been hindered by the random nature of the data and
that replication offers a solution to this problem by reducing the randomness in the data.